release: v0.8.65 (provider/route + Fleet epics, hardening, version bump) by Hmbown · Pull Request #3544 · Hmbown/CodeWhale

Hmbown · 2026-06-24T07:02:22Z

Summary

This PR is the v0.8.65 release: the workspace is bumped 0.8.64 → 0.8.65, and the provider/route + Fleet epic work lands alongside the release hardening. Every change was re-verified against current main and integrated commit-by-commit with the gate green at each step. Full notes are in CHANGELOG.md [0.8.65]; overnight detail is in scratchpad/v0.8.65-release-handoff-2026-06-24.md.

Epics advanced (real, tested)

Provider/route (v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608): client now built from the resolved ReadyRouteCandidate (v0.8.65: Resolve every provider/model switch through a ReadyRouteCandidate #3384); honest per-token pricing flows to candidates (v0.8.65: Provider/offering usage and PricingSku engine with provenance #3085); a committed Models.dev-shaped bundled catalog gives models real context windows (v0.8.65: Provider-owned live catalogs and secret-free model cache #3385); route-aware context budgets (v0.8.65: Resolved-route context budget service for windows, output caps, compaction, and UI pressure #3086); normalized usage incl. Responses cache-miss/reasoning (v0.8.65: Normalize provider usage telemetry for tokens, cache, reasoning, and quota #2961); provider-blind reasoning bugs fixed.
Fleet (v0.8.65 EPIC: Fleet execution substrate for profiled workers #3154): receipts persist the resolved route + ledger assertion (#3154/v0.8.65: Fleet route parity smoke, soak, and handoff proof #3166).
Providers: dashboard maturity marker + open-models action (v0.8.65: /provider readiness dashboard from route/catalog projections #3083/v0.8.65: OpenAI Codex/ChatGPT OAuth route verification and usage display #2984); auth-aware fallback eligibility (v0.8.65: Capability-aware provider fallback chain with visible route switching #2574); user-defined OpenAI-compatible custom providers (v0.8.65: Custom provider endpoints, models, and auth within provider-scoped routing #1519); insecure-HTTP advisory (#1519).
Config/mode: config.rs leaf-split behind a pub use facade (v0.8.65: Split config modules around provider/model/catalog boundaries #3311); mode-vs-permission policy via base_policy_for_mode (advisory review-intent preserved per d7d7c714e) (v0.8.65: Untangle Plan/Agent/YOLO mode cycling from permission policy #3386).
Docs: README architecture end-cap (v0.8.65 end-cap: Rewrite README with CodeWhale history, Fleet, and provider routing map #3087); DeepSeek-Anthropic comparison report (v0.8.65: DeepSeek Anthropic-compatible endpoint wire-protocol spike #2963, decision pending live numbers).
Hardening (from the bug audit): strict clippy -D warnings, deterministic web facts build, metadataBase, digest page robustness, cargo audit clean, de-hardcoded repo guidance.

Deferred to 0.8.66 (honest, with reasons — see handoff)

#3205 router de-hardcoding (high-risk runtime auto-routing), #3478 visible card re-anchor (needs ui.rs hot-path follow-up; proven infra on branch codex/issue-3478-…), #3075 model-picker catalog rows (now unblocked by #3385), #1519 arbitrary-named distinct identities, live-number verification for #2963/#2984 (need creds). #3494 dropped per maintainer.

Testing

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --locked -- -D warnings
cargo test — lib/protocol/cli/whaleflow/state + codewhale-tui --bins (5275 passed, 0 failed)
cargo build --release -p codewhale-cli -p codewhale-tui; codewhale --version → 0.8.65
./scripts/release/check-versions.sh (0.8.65 consistent) · cargo audit clean · check-provider-registry.py · web lint/test/check-facts/build
Manual TUI QA (six-worker fanout, multi-terminal route isolation, queued steering) — per release-qa-sweep, needs a human run

Checklist

Updated docs/comments (CHANGELOG, README, AGENTS/CLAUDE)
Added/updated tests for every code slice
Verified TUI behavior manually — automated tests cover the slices; live-TUI QA pending
Co-author credit uses GitHub numeric noreply (no bot trailers; harvested community PRs handled separately — see handoff)

Does not tag, publish, or merge to main — those remain maintainer-gated.

🤖 Generated with Claude Code

…op unused import)

Audit #2/#9/#10 (scratchpad/bug-audit-2026-06-24.md):
- Parse each KV digest entry independently so one malformed record can no longer blank the whole archive (#9).
- Accept locale params and localize empty-state + archive chrome for /zh (#10).
- Add a proper generateMetadata via buildPageMetadata (title/desc + metadataBase for the route) and drop the unused next/link import that tripped lint (#2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Audit #3 (scratchpad/bug-audit-2026-06-24.md): `npm run build` ran derive-facts.mjs which always stamped a fresh generatedAt into the tracked web/lib/facts.generated.ts, dirtying the working tree on every clean build. Preserve the committed generatedAt when every *checked* fact (the same set the drift gate compares, excluding generatedAt + runtime-only latestRelease) is unchanged, so a clean rebuild leaves the tracked file byte-identical. A real fact change still stamps a fresh timestamp. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Audit #4 (scratchpad/bug-audit-2026-06-24.md): Next build warned that metadataBase was unset and fell back to http://localhost:3000 for social image resolution on root-segment routes (/_not-found and the root opengraph-image), which never inherited the per-locale layout's metadata. Add a minimal root app/layout.tsx that supplies metadataBase for every route; the per-locale <html>/<body>, fonts, and content metadata stay in app/[locale]/layout.tsx. Build now emits no metadataBase warning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Audit #1 (scratchpad/bug-audit-2026-06-24.md): `cargo clippy --workspace --all-targets --locked -- -D warnings` failed with 9 lints.

Fixes:
- field_reassign_with_default: build ProvidersConfig with struct-update syntax in the deepseek_anthropic test helper (client.rs).
- needless_borrow / needless_borrows_for_generic_args: drop three borrows in session_picker + widgets composer tests.
- await_holding_lock: document why the env lock is intentionally held across the child-process await in the secret-env isolation test (js_execution.rs).
- print_stderr: localized allow for the test-only pandoc skip diagnostic, which trips the module-wide deny meant for prod code.
- too_many_arguments (x3): narrow, documented allows on the two SSE parsers (shared mutable parser-state set on the hot streaming path) and auto_review_plan_decision (mirrors AutoReviewContext::from_tool_call).

Gate now green; cargo fmt --check clean; touched tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
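
For readers unfamiliar with the field_reassign_with_default lint, a minimal sketch of the struct-update form follows. The `ProvidersConfig` fields and the `test_helper_config` name are hypothetical stand-ins, not the real definitions in client.rs.

```rust
// Hypothetical stand-in for the real ProvidersConfig; field names are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ProvidersConfig {
    pub base_url: Option<String>,
    pub default_model: Option<String>,
    pub timeout_secs: u64,
}

// The pattern clippy::field_reassign_with_default flags looks like:
//     let mut cfg = ProvidersConfig::default();
//     cfg.base_url = Some(...);
//
// Struct-update syntax sets the overrides and fills the remaining fields
// from Default in a single expression, so no mutable binding is needed.
pub fn test_helper_config(base_url: &str) -> ProvidersConfig {
    ProvidersConfig {
        base_url: Some(base_url.to_string()),
        ..Default::default()
    }
}
```

The behavior is the same as the reassign form; only the construction style changes.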

Audit #8 (scratchpad/bug-audit-2026-06-24.md): the Fleet setup planner tracked a selected role/model but Enter/g/G always inserted the same hard-coded reviewer.toml authoring prompt, so the selection had no functional outcome. Build the profile prompt on demand from the live selection: the Role lane picks the profile file stem + role_hint, and the Model lane maps to model_class_hint (fast/balanced/deep-reasoning/tool-heavy/inherit). Adds a regression test that navigating to builder + fast yields builder.toml with the matching hints. Audit #7 (modal i18n) is intentionally deferred to the #3167 interactive-picker rework (documented in the module header) to avoid translating ~90 volatile technical strings that the redesign will churn; CmdFleetDescription is already localized and the selection wiring above is locale-independent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
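
The on-demand prompt construction described above might look roughly like the sketch below. The `Selection` struct and the prompt wording are assumptions made for illustration; only the label-to-hint mapping and the builder + fast expectation come from this PR.

```rust
/// Hypothetical snapshot of the planner's live selection.
pub struct Selection<'a> {
    /// Label highlighted in the Role lane; doubles as the profile file stem.
    pub role: &'a str,
    /// Label highlighted in the Model lane.
    pub model_label: &'a str,
}

/// Map a Model-lane label to a model_class_hint; unknown labels inherit.
pub fn model_class_hint(label: &str) -> &'static str {
    match label {
        "fast" => "fast",
        "balanced" => "balanced",
        "strong" | "deep-reasoning" => "deep-reasoning",
        "tool-heavy" => "tool-heavy",
        _ => "inherit",
    }
}

/// Build the authoring prompt at call time so it always reflects the
/// current selection instead of a string captured at construction.
/// The prompt text itself is illustrative, not the shipped wording.
pub fn profile_prompt(sel: &Selection) -> String {
    format!(
        "Author {stem}.toml with role_hint = \"{role}\" and model_class_hint = \"{class}\".",
        stem = sel.role,
        role = sel.role,
        class = model_class_hint(sel.model_label),
    )
}
```

Computing the string lazily means there is no cached value that can go stale when the user moves the cursor.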


Audit #5 (scratchpad/bug-audit-2026-06-24.md): AGENTS.md and CLAUDE.md still named codex/v0.8.63-integration, workspace 0.8.63, and milestone v0.8.63, which rot between release lanes and mislead release work (README was already current at 0.8.64). Replace the hard-coded branch/version/milestone with live-truth guidance (read version from Cargo.toml, confirm the lane from the active handoff + git, list the milestone via gh). Harvests the framing from draft PR #3452 by @Hmbown while KEEPING the guardrails that PR dropped (no-speculative spawn_blocking, sole base prompt, agent-only sub-agent surface, approval gates, known-flaky papercuts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Audit #11 (scratchpad/bug-audit-2026-06-24.md): cargo audit reported three unmaintained-crate warnings (RUSTSEC-2024-0388 derivative, RUSTSEC-2025-0057 fxhash, RUSTSEC-2024-0436 paste), all transitive only via the starlark 0.13.0 family used by crates/tui and crates/whaleflow. None is a vulnerability and none is fixable without an upstream starlark release. Add .cargo/audit.toml ignoring exactly those three IDs with annotated rationale and a removal trigger, so cargo audit stays clean and surfaces genuinely new advisories. cargo audit now reports no warnings. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

claude · 2026-06-24T07:02:38Z

Claude finished @Hmbown's task in 5m 10s

PR Review: fix(release): v0.8.65 release hardening (audit findings)

Status: Complete

Todo:

Gather context (PR metadata, changed files)
Review TUI changes (fleet_setup.rs, client.rs, chat.rs, engine.rs)
Review web changes (digest/page.tsx, layout.tsx, derive-facts.mjs)
Review build/config changes (.cargo/audit.toml, AGENTS.md, CLAUDE.md)
Compile final review summary

Summary

Solid release-hardening batch. The correctness fixes are on point, the clippy suppressions are properly justified, and the fleet-setup regression test covers the three key assertions. A few minor concerns below — none blocking.

`crates/tui/src/tui/views/fleet_setup.rs` — Fix #8 (main change)

Good: Switching from a stored profile_prompt: String (stale at construction) to a lazily-computed profile_prompt() -> String (reflects current selection) is the right design. Storing the snapshot in the struct and recomputing the prompt on demand is clean.

Test: The new selected_role_and_model_class_drive_generated_profile test navigates correctly:

3× Down in Role lane: manager(0)→main(1)→scout(2)→builder(3) ✓
Right + 2× Down in Model lane: current-route(0)→inherit(1)→fast(2) ✓

Minor — model_class_hint can't produce "code" or "review": The prompt string in profile_authoring_prompt lists these as valid model_class_hint values, but the match arm in model_class_hint() can't reach them (no Model lane row has a "code" or "review" label; they'd all fall through to "inherit"). Either add rows for these class hints or remove them from the schema doc string to avoid misleading the generating LLM.

// fleet_setup.rs ~L651 — "code" and "review" listed in the prompt but unreachable here
fn model_class_hint(label: &str) -> &'static str {
    match label {
        "fast" => "fast",
        "balanced" => "balanced",
        "strong" => "deep-reasoning",
        "deep-reasoning" => "deep-reasoning",
        "tool-heavy" => "tool-heavy",
        _ => "inherit",   // "code" / "review" fall here, never returned
    }
}


Nit — variable shadowing in selected_label: the closure param lane shadows the outer lane: usize parameter. Harmless, but a rename to fleet_lane would clarify the type change.

fn selected_label(&self, lane: usize) -> Option<&str> {
    self.lanes
        .get(lane)
        .and_then(|lane| lane.rows.get(row))  // `lane` is now FleetSetupLane, not usize
        ...
}

Provider/model safety — Clean ✓: model_class_hint only affects the TOML content being authored, not provider selection. The current route is explicitly passed as context-only ("Current route context only: provider = ...").

`web/app/layout.tsx` — Fix #4 (metadataBase)

Potential Next.js constraint: Next.js technically requires the root layout to include <html> and <body> tags. The current implementation returns children directly:

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return children;
}

This works in practice because app/[locale]/layout.tsx provides <html>/<body> for all app routes, and the build passes cleanly. However, it's a deviation from the documented requirement. If a new non-locale route is added at the root segment (e.g., app/some-page/page.tsx), it will render without a proper HTML document skeleton since the root layout doesn't provide one and the [locale] layout won't wrap it. Low risk given the current route structure, but worth noting for future additions.

`web/scripts/derive-facts.mjs` — Fix #3 (deterministic prebuild)

Logic is correct. The VOLATILE set mirrors the drift gate's ignore list, so the comparison is consistent with what CI checks.

Nit — regex fragility: The regex /export const FACTS\s*:\s*\w+\s*=\s*([\s\S]*?);?\s*$/ extracts the committed JSON object. The lazy [\s\S]*? combined with the optional ;? is correct for the current generated file format, but it would silently return null if the file ends with content after the FACTS declaration. Since the file is generated by the same script this is a low risk; the failure mode is safe (falls back to fresh timestamp). A minor hardening option would be to anchor more tightly (e.g., ([\s\S]+)\s*;\s*$ after the =), though not strictly necessary.

Clippy suppressions (`client.rs`, `chat.rs`, `engine.rs`, `js_execution.rs`, `pandoc.rs`)

All five suppressions are well-justified:

field_reassign_with_default in client.rs — correct struct-literal form
too_many_arguments on SSE hot paths and auto_review_plan_decision — struct bundling would add reborrow noise; comments explain the rationale
await_holding_lock in js_execution.rs test — intentional; the lock must span the await to prevent env races
print_stderr in pandoc.rs test helper — test-only skip, clearly scoped

No concerns here.

`web/app/[locale]/digest/page.tsx` — Fix #2

Per-entry try/catch with flatMap is the correct resilience pattern. The isZh binary locale check is appropriate for the current two-locale setup. The generateMetadata export is a clean addition.

`.cargo/audit.toml` — Fix #11

Suppressing three starlark-transitive "unmaintained" advisories is appropriate. The rationale and removal trigger are documented inline. The cargo tree -i verification claim in the comment is the right evidentiary standard.

`AGENTS.md` / `CLAUDE.md` — Fix #5

The de-hardcoding is the right call — "read from Cargo.toml" and "check the live milestone" are better instructions than a version number that will be stale by the next release cycle.

Minor correctness items in other TUI files

session_picker.rs / widgets/mod.rs — needless-borrow removals (&selected_id → selected_id, contains(&...) → contains(...)) are correct clippy fixes.
client.rs field-reassign-with-default — clean, no behavioral change.

Overall: The audit findings are addressed correctly. The one actionable item is the "code" / "review" gap in model_class_hint — the rest are nits or low-priority hardening notes. Ready to merge once the manual TUI QA pass (noted in the checklist) is complete.

…2961)

Slice 1 (#3086): make the context report's context window route-aware and collapse the duplicated pressure-threshold copy.
- ReportBuilder::finish now takes the provider + active route limits and uses route_budget::route_context_window_tokens, so a resolved route's context window overrides the bare model default in the report.
- build_context_report threads app.api_provider + app.active_route_limits; the headless doctor path uses provider_capability(...).context_window (route limits are unresolved headless) with an explanatory comment.
- pressure_label now delegates to context_budget::PressureLevel so the diagnostic label can no longer drift from the unified thresholds (the "high" boundary moves 70% -> 75%, matching the compaction trigger).

The blanket dead_code allow in context_budget.rs is retained: only from_usage_percent and label gain a non-test consumer; ContextBudget and suggests_compaction are still pending their engine/TUI wiring.

Slice 2 (#2961, parser only): stop hardcoding Responses usage fields as None.
- parse_responses_usage now derives prompt_cache_miss_tokens as input minus the cached hit when cached input tokens are reported, and reads reasoning tokens from output_tokens_details.reasoning_tokens, mirroring the Chat-Completions parser.

Tests: route window overrides model default in the report; pressure label matches PressureLevel boundaries; Responses usage surfaces cache-miss and reasoning when present and stays None otherwise.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
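
The Slice 2 derivation can be sketched as below. `RawResponsesUsage`, `NormalizedUsage`, and `normalize` are hypothetical names for illustration; the real parse_responses_usage reads JSON, and its clamping behavior is not stated here, so the saturating subtraction is an assumption.

```rust
/// Hypothetical subset of the usage object on a Responses-style reply.
pub struct RawResponsesUsage {
    pub input_tokens: u64,
    /// Cached-hit count, present only when the provider reports it.
    pub cached_input_tokens: Option<u64>,
    /// Mirrors output_tokens_details.reasoning_tokens when present.
    pub reasoning_tokens: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub struct NormalizedUsage {
    pub prompt_cache_hit_tokens: Option<u64>,
    pub prompt_cache_miss_tokens: Option<u64>,
    pub reasoning_tokens: Option<u64>,
}

pub fn normalize(raw: &RawResponsesUsage) -> NormalizedUsage {
    NormalizedUsage {
        prompt_cache_hit_tokens: raw.cached_input_tokens,
        // Miss = input - hit, derived only when a hit count was reported.
        // saturating_sub (an assumption) guards against hit > input.
        prompt_cache_miss_tokens: raw
            .cached_input_tokens
            .map(|hit| raw.input_tokens.saturating_sub(hit)),
        // Passed through as-is; stays None when the provider omits it.
        reasoning_tokens: raw.reasoning_tokens,
    }
}
```

Keeping both derived fields as `Option` preserves the distinction between "zero" and "not reported", which matches the "stays None otherwise" test described above.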

…1519)

Light up real pricing on resolved route candidates and add a loopback-exempt insecure-http advisory, both in the route resolver.

#3085 (pricing keystone): the resolver hardcoded `Some(PricingSku::UnknownOrStale)` on every candidate because `ProviderModelOffering` (the type the resolver consumes) carried no cost, and `route_pricing_sku` takes a `CatalogOffering` the resolver never holds. Thread a projected `PricingSku` onto `ProviderModelOffering`, populated where the sourced cost is in scope:
- `CatalogOffering::to_offering` projects via `route_pricing_sku(self)`.
- The Models.dev offering builders project via a new `route_pricing_sku_from_cost` helper (same honesty rule, raw cost input).
- `bundled_offerings` (no sourced cost) stays `UnknownOrStale`.

The resolver now carries the matched offering's pricing onto the candidate and keeps `UnknownOrStale` on every branch with no matched offering. No price is ever fabricated (the #2608/#3085 honesty rule). `ProviderModelOffering` drops its `Eq` derive (PricingSku::Token holds f64; still `PartialEq`); `PricingSku` gains `PartialEq`.

#1519 (insecure-http warning): after the endpoint is built, push an advisory "endpoint uses insecure http:// (credentials sent in plaintext)" message when the base URL is non-loopback `http://`. Loopback (localhost / 127.0.0.0/8 / ::1) is exempt so local Ollama/vLLM/SGLang defaults stay clean. Advisory only: `validation.ok` stays true. Host parsing is dependency-free (no url crate).

Tests: priced_offering_yields_token_pricing_sku, unpriced_offering_stays_unknown, http_custom_endpoint_emits_insecure_warning, loopback_http_endpoint_does_not_warn, https_endpoint_has_no_warning.
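
A dependency-free loopback check of the kind described for #1519 could be sketched as follows. The function names and the exact parsing rules are assumptions; the resolver's real host parsing may differ.

```rust
use std::net::IpAddr;

/// Extract the host from the part of a URL after "http://", without a URL crate.
fn host_of(rest: &str) -> &str {
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop any userinfo ("user:pass@host").
    let authority = authority.rsplit('@').next().unwrap_or(authority);
    if let Some(stripped) = authority.strip_prefix('[') {
        // Bracketed IPv6 literal such as [::1]:8080.
        stripped.split(']').next().unwrap_or("")
    } else {
        // Plain host or host:port.
        authority.split(':').next().unwrap_or("")
    }
}

/// Return the advisory text for non-loopback http:// endpoints, else None.
pub fn insecure_http_warning(base_url: &str) -> Option<&'static str> {
    let lower = base_url.trim().to_ascii_lowercase();
    // Anything that is not plain http:// (for example https://) never warns.
    let rest = lower.strip_prefix("http://")?;
    let host = host_of(rest);
    // IpAddr::is_loopback covers 127.0.0.0/8 and ::1.
    let loopback = host == "localhost"
        || host.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false);
    if loopback {
        None
    } else {
        Some("endpoint uses insecure http:// (credentials sent in plaintext)")
    }
}
```

Because the result is advisory, a caller would append the message to its warnings list and leave the validation outcome unchanged.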

Route-consumption keystone, slices A + B. Surgical and byte-identical for normal configs today; closes two provider-blind reasoning seams.

Slice A: construct the client FROM the candidate
- client.rs: add `DeepSeekClient::from_candidate(config, candidate)` beside `new`. Both now share a private `from_parts(base_url, default_model, config)` helper so they cannot drift. `from_candidate` overrides base_url <- candidate.endpoint.base_url and default_model <- candidate.wire_model_id; the API key and provider still come from `Config` because `ReadyRouteCandidate` is secret-free by design.
- Switch the three call sites that already hold a candidate: engine.rs `activate_runtime_route`, and ui.rs `switch_provider` + `apply_provider_fallback_switch`, all to `from_candidate(&cfg, &route.candidate)`. The candidate is a partial-move-safe field of the resolved route, still live after `route.config` is taken.

Slice B: fix two provider-blind reasoning seams
- model_routing.rs `resolve_explicit_route_with_inventory`: both arms resolved effort with a bare `ReasoningEffort::from_setting`, ignoring the candidate's provider. Wrap each with `normalize_auto_route_effort_for_provider(candidate.provider, ...)` so an explicit route to a non-active provider gets that provider's effort floor, not the active provider's raw setting.
- turn_loop.rs auto path: `resolve_auto_effort` applied the selected tier with a provider-blind `as_setting()`. Thread `self.api_provider` in and normalize via `normalize_auto_route_effort_for_provider`.

Tests
- client.rs: from_candidate_uses_candidate_base_url_and_wire_model and from_candidate_matches_new_when_config_agrees (candidates minted via the RouteResolver-backed resolve_runtime_route, the sole producer).
- model_routing.rs: explicit_route_to_nonactive_provider_uses_that_providers_effort (active=deepseek, explicit GLM-5.2 routes to Z.ai with effort normalized low -> high).

Deferred to 0.8.66 (explicit non-goals): threading the candidate into MessageRequest/create_message_stream/turn-loop dispatch, adding a reasoning field to the candidate, and model_inventory router hard-coding.
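
The Slice A shape, two public constructors funneling through one private helper, can be sketched as below. All types here are trimmed-down, hypothetical stand-ins (the real client is `DeepSeekClient` and carries far more state).

```rust
/// Hypothetical, trimmed-down stand-ins; the real types carry more fields.
pub struct Config {
    pub base_url: String,
    pub default_model: String,
    pub api_key: String,
}
pub struct Endpoint {
    pub base_url: String,
}
/// Secret-free by design: no credential field exists on the candidate.
pub struct ReadyRouteCandidate {
    pub endpoint: Endpoint,
    pub wire_model_id: String,
}

#[derive(Debug, PartialEq)]
pub struct Client {
    pub base_url: String,
    pub default_model: String,
    pub api_key: String,
}

impl Client {
    /// Single construction path, so `new` and `from_candidate` cannot drift.
    fn from_parts(base_url: &str, default_model: &str, config: &Config) -> Self {
        Client {
            base_url: base_url.to_string(),
            default_model: default_model.to_string(),
            // The credential always comes from Config, never the candidate.
            api_key: config.api_key.clone(),
        }
    }

    pub fn new(config: &Config) -> Self {
        Self::from_parts(&config.base_url, &config.default_model, config)
    }

    pub fn from_candidate(config: &Config, candidate: &ReadyRouteCandidate) -> Self {
        Self::from_parts(&candidate.endpoint.base_url, &candidate.wire_model_id, config)
    }
}
```

When the candidate agrees with the config, both constructors produce equal clients, which is the property from_candidate_matches_new_when_config_agrees checks.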

Persist an additive, plain-strings resolved-route snapshot on the Fleet receipt so ledgers record which provider/model/protocol a task resolved to. Closes the gap where ReadyRouteCandidate carried this detail but the ledger dropped it.

Slice (#3154):
- Add `FleetResolvedRoute` to codewhale-protocol (provider_id, provider_kind, canonical_model, wire_model_id, protocol, role, loadout, source). Plain strings only, with no codewhale-config route type dependency.
- Add `FleetReceipt.resolved_route: Option<FleetResolvedRoute>` behind `#[serde(default)]` so pre-existing ledgers still deserialize.
- Thread the route through `FleetTaskVerificationInput` and populate it in both receipt builders (task_spec verification path and the manager simulated/transport fallback) from a single mint per task.
- Mint via the existing hermetic resolver bridge (`route_runtime::resolve_route_candidate`) in `worker_runtime`, reusing the effective fleet role/loadout. `canonical_model` stays honest-None when the resolver cannot pin one; no reasoning/pricing fields invented.

No-secrets invariant (#3154):
- `FleetResolvedRoute` has no field that can hold a credential. Tests assert the serialized receipt/route contains no api_key/bearer/sk-*/auth-token/secret markers.

Assertion (#3166 scope #10):
- Extend the landed 10-task smoke to assert every receipt carries a resolved route with non-empty provider/wire_model_id, a role, and source == "resolver", plus a no-secrets scan over each serialized receipt.

Tests: round-trip, legacy back-compat (missing field), no-secrets, and resolver-mint coverage. cargo test -p codewhale-protocol and the codewhale-tui fleet suite pass; new code is clippy-clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
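
A no-secrets scan over a serialized receipt could be as simple as the sketch below. The marker list follows the commit message; the function name and the case-insensitive substring matching are assumptions, not the actual test helper.

```rust
/// Markers named in the commit message; the matching rule is an assumption.
const SECRET_MARKERS: [&str; 5] = ["api_key", "bearer", "sk-", "auth-token", "secret"];

/// Return the first marker found in the serialized text, if any.
pub fn first_secret_marker(serialized: &str) -> Option<&'static str> {
    // Lowercase once so the comparison is case-insensitive.
    let lower = serialized.to_ascii_lowercase();
    SECRET_MARKERS.iter().copied().find(|m| lower.contains(*m))
}
```

Plain substring matching is deliberately strict and can produce false positives (for instance, "task-1" contains "sk-"), so a scan like this fits best over records whose field values are known not to collide with the markers.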

Two small provider-dashboard slices: - #2984: add a typed `ProviderMaturity { Experimental, Supported }` marker, separate from `ProviderReadiness` (which tracks auth/route state). Seed it from a per-provider table (OpenaiCodex => Experimental, everything else Supported), carry it on `ProviderDashboardRow`, and surface an `experimental` tag in the compact hint only for experimental providers so the common case stays noise-free. - #3083: add an `M`/`m` action in the provider picker that emits a new `ViewEvent::ProviderPickerOpenModels { provider }`. The ui.rs handler opens the `/model` picker via the existing path and pre-filters it to the highlighted provider by seeding the picker's search query with the provider display name (reusing the model picker's existing provider-name scoping; no model_picker internals touched). Footer hint gains `M models`. Tests cover the maturity marker/tag for OpenaiCodex vs Deepseek and the `m`/`M` key emitting ProviderPickerOpenModels for the highlighted provider.
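
The maturity marker from #2984 can be pictured with the sketch below. `ProviderMaturity` and its two variants come from the commit message; the `Provider` enum subset and the function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderMaturity {
    Experimental,
    Supported,
}

/// Hypothetical subset of the real provider enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Deepseek,
    OpenaiCodex,
    Zai,
}

/// Per-provider table: OpenaiCodex is experimental, everything else supported.
pub fn maturity_for(provider: Provider) -> ProviderMaturity {
    match provider {
        Provider::OpenaiCodex => ProviderMaturity::Experimental,
        _ => ProviderMaturity::Supported,
    }
}

/// Tag for the compact hint; None keeps the common case free of noise.
pub fn compact_tag(provider: Provider) -> Option<&'static str> {
    match maturity_for(provider) {
        ProviderMaturity::Experimental => Some("experimental"),
        ProviderMaturity::Supported => None,
    }
}
```

Keeping maturity separate from readiness means an experimental provider can still be fully authenticated and routable without the two states being conflated.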

…3385) The default RouteResolver::new() previously sourced only the 4-row hand seam (deepseek/together/openrouter), so the picker/candidates had real route facts for almost nothing and fell back to RouteLimits::default() (unknown) for every other provider/model. bundled_offerings_from_models_dev() existed but was fed only by test fixtures — there was no committed catalog asset. This commit: - Adds crates/config/assets/models_dev.bundled.json, a network-free Models.dev-shaped snapshot matching the ModelsDevCatalog deserialization shape (models_dev.rs). It is CURATED from in-repo verified facts rather than a live models.dev dump: context windows / output caps come from crates/tui/src/models.rs and USD-per-million pricing from crates/tui/src/pricing.rs. The public models.dev catalog tracks a different real model generation than CodeWhale's curated forward-dated set, so a live transform would disagree with the repo's own model registry and tests; curated-but-accurate was preferred per the issue. Coverage: 13 providers (deepseek, zai, moonshot, minimax, openai, anthropic, openrouter, together, fireworks, novita, siliconflow, arcee, xiaomi-mimo), 27 chat offerings. Pricing is omitted where the repo has no trustworthy per-token rate (DeepSeek-native rows, aggregator-hosted DeepSeek, Anthropic, MiMo Token Plan) so it surfaces honestly rather than as a fabricated zero. - Adds catalog loaders BUNDLED_MODELS_DEV_JSON (include_str!), bundled_models_dev_catalog(), and bundled_catalog_offerings(). - Wires RouteResolver::new() to merge the asset rows UNDER the hand seam: the seam keeps precedence on a (provider, wire id) collision so the curated canonical-model joins and the deliberately-unpriced DeepSeek-native entries the route invariants assert are preserved, while asset-only rows (GLM, Kimi, MiniMax, Qwen, …) now flow real context windows to candidates. 
Each provider's default row uses its built-in DEFAULT_*_MODEL wire id so the descriptor-conformance default-route test stays green. bundled_offerings_from_models_dev keeps its signature. New tests: the asset parses and deserializes; bundled offerings expose real chat facts; pricing is honest; and RouteResolver::new() resolves GLM/Kimi to real (non-default) context windows. cargo test -p codewhale-config and cargo clippy -p codewhale-config --all-targets -- -D warnings are green.
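
The "merge UNDER the hand seam" precedence rule can be sketched as follows. `Offering` is a hypothetical reduction of the real offering type, and the context-window numbers in any usage are placeholders rather than real model facts.

```rust
use std::collections::BTreeMap;

/// Hypothetical reduction of an offering row to the fields relevant here.
#[derive(Debug, Clone, PartialEq)]
pub struct Offering {
    pub provider: String,
    pub wire_id: String,
    pub context_window: Option<u32>,
}

/// Merge asset rows under the seam: on a (provider, wire id) collision
/// the seam row wins; asset-only rows are kept as they are.
pub fn merge_under_seam(seam: Vec<Offering>, asset: Vec<Offering>) -> Vec<Offering> {
    let mut merged: BTreeMap<(String, String), Offering> = BTreeMap::new();
    // Asset rows go in first; seam rows follow and overwrite on collision.
    for row in asset.into_iter().chain(seam) {
        merged.insert((row.provider.clone(), row.wire_id.clone()), row);
    }
    merged.into_values().collect()
}
```

Inserting the lower-precedence source first keeps the rule in one place: whatever is inserted last for a key is what the resolver sees.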

Integration follow-up: #3385's default_resolver test asserted glm-5.1 pricing stays UnknownOrStale (true on #3385's pre-#3085 base). On the release branch the #3085 keystone projects the asset's provider-scoped cost onto the candidate via route_pricing_sku, so the priced Z.ai row now carries a real PricingSku::Token. Update the assertion to the integrated reality.

…llback (#2574)

#3386: Untangle mode cycling from permission policy. Introduce `ModeSessionPrefs` (the durable Agent-era baseline that Plan/YOLO derive from and restore to) and a pure `base_policy_for_mode(mode, prefs) -> EffectiveModePolicy` implementing the mode table: Plan = read-only / no-shell / Suggest, Agent = the baseline, YOLO = shell + trust + Auto. `set_mode` now refreshes the baseline from the live mirrors when leaving Agent, then derives and applies the incoming mode's policy in one block. This subsumes the ad-hoc YoloRestoreState/PlanRestoreState snapshots so YOLO's elevated authority can no longer bleed into the restored Agent surface. The boolean fields (allow_shell/trust_mode/approval_mode/yolo) stay as derived mirrors, with no crate-wide type migration. Behavior is preserved exactly: the shipped advisory review-only behavior is untouched (no review-intent -> Plan downgrade), and the existing yolo/plan/cycle round-trip tests stay green.

#2574: Make `advance_fallback` capability-aware. It now walks the chain skipping providers that are not ready (hosted providers missing a key, via the same `has_api_key_for` the picker uses) while local providers (Ollama/vLLM/SGLang) are always eligible, appending a clear "skipped <p>: needs auth" note per skip and a reason on exhaustion. Readiness is captured into a per-provider snapshot at startup; `ProviderChain::advance` stays pure. 401s already bypass fallback at the call site, so a bad key still does not silently rotate providers.

Tests: base_policy_for_mode table; set_mode round-trips (Agent->YOLO->Agent, Plan->YOLO->Agent, edited-baseline restore); advance_fallback skip/land, all-unready exhaustion, and local-without-key eligibility.
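
The mode table for #3386 can be read as a pure function, sketched below. The type names follow the commit message, but the field sets are assumptions, as are taking `prefs` by reference and turning trust off in Plan mode (the message only specifies read-only / no-shell / Suggest for Plan).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Plan,
    Agent,
    Yolo,
}

/// Hypothetical subset of the real approval enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Suggest,
    Auto,
}

/// Durable Agent-era baseline that Plan/YOLO derive from and restore to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModeSessionPrefs {
    pub allow_shell: bool,
    pub trust_mode: bool,
    pub approval_mode: ApprovalMode,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EffectiveModePolicy {
    pub read_only: bool,
    pub allow_shell: bool,
    pub trust_mode: bool,
    pub approval_mode: ApprovalMode,
}

/// Pure derivation: the result depends only on the mode and the baseline.
pub fn base_policy_for_mode(mode: Mode, prefs: &ModeSessionPrefs) -> EffectiveModePolicy {
    match mode {
        // Plan: read-only, no shell, Suggest (trust off is an assumption).
        Mode::Plan => EffectiveModePolicy {
            read_only: true,
            allow_shell: false,
            trust_mode: false,
            approval_mode: ApprovalMode::Suggest,
        },
        // Agent: exactly the stored baseline.
        Mode::Agent => EffectiveModePolicy {
            read_only: false,
            allow_shell: prefs.allow_shell,
            trust_mode: prefs.trust_mode,
            approval_mode: prefs.approval_mode,
        },
        // YOLO: shell + trust + Auto, regardless of the baseline.
        Mode::Yolo => EffectiveModePolicy {
            read_only: false,
            allow_shell: true,
            trust_mode: true,
            approval_mode: ApprovalMode::Auto,
        },
    }
}
```

Because Agent is always derived from the baseline and never from the previous mode's live state, an Agent->YOLO->Agent round-trip cannot carry YOLO's elevated flags back.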

Extract self-contained, behavior-preserving leaves out of the ~6.7k-line crates/tui/src/config.rs into sibling modules under crates/tui/src/config/, each re-exported behind a `pub use` facade so every existing `crate::config::<symbol>` path resolves unchanged. No call site outside config.rs is touched; entangled credential/provider/default-model/normalization/route logic stays put.

Modules created (config.rs: 6739 -> 6246 lines, ~493 lines moved):
- config/models.rs (160): provider model-name + base-URL constants and curated model lists (DEFAULT_*_MODEL/BASE_URL, RECENT_OPENROUTER_*, COMMON/OFFICIAL_DEEPSEEK_MODELS, ...). API_KEYRING_SENTINEL stays in config.rs with the credential logic.
- config/search.rs (135): SearchProvider, SearchProviderSource, SearchProviderResolution, SearchConfig (self-contained [search] types).
- config/subagent_limits.rs (69): sub-agent concurrency/timeout limit constants + their two private clamp resolvers (pulled back privately, no new external surface).
- config/paths.rs (203): pure filesystem path helpers (config/cache/workspace path resolution, env-var path overrides, ~ expansion); effective_home_dir/expand_path re-exported pub(crate). Workspace-trust and config-load logic stay in config.rs.

scripts/check-provider-registry.py now also scans config/models.rs for the default model/base-URL constants so the registry gate follows the split.

Verify: config suite 480 passed / 0 failed; full bin suite 5248 passed / 0 failed; cargo build -p codewhale-tui --locked green; clippy clean on all changed files (the 3 remaining -D warnings errors are pre-existing too_many_arguments lints in untouched chat.rs/engine.rs); cargo fmt --all clean; provider registry drift check passes.
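
The facade pattern used for the split can be shown in miniature with inline modules. The module layout mirrors the description above, but the `expand_path` signature here is hypothetical (the real helper's parameters are not given in this PR).

```rust
pub mod config {
    // Leaf module: constants moved out of the large parent file.
    mod models {
        pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro";
    }

    // Leaf module: pure path helpers.
    mod paths {
        /// Expand a leading "~/" against an explicit home directory.
        /// Hypothetical signature, chosen to keep the sketch free of env access.
        pub(crate) fn expand_path(path: &str, home: &str) -> String {
            match path.strip_prefix("~/") {
                Some(rest) => format!("{home}/{rest}"),
                None => path.to_string(),
            }
        }
    }

    // Facade: callers keep writing `config::DEFAULT_TEXT_MODEL` and
    // `config::expand_path`; the leaf modules themselves stay private.
    pub use models::*;
    pub(crate) use paths::expand_path;
}
```

Since the leaves are private and only re-exported, the split can be reshuffled later without touching any call site outside the parent module.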

… pending live numbers) Adds benchmark_results/deepseek-anthropic-comparison-2026-06-24.md for #2963. The deepseek-anthropic / deepseek-claude route already landed (5b8a5ac / #3449); this is the reporting deliverable, not code. The report: - documents what landed (route, x-api-key + anthropic-version auth, AnthropicMessages wire format, body/SSE/usage parsing) with file:line cites; - records code-derived findings without live calls: server tools / web search are filtered out on encode (anthropic.rs:361) so that capability is not exercised via this route today, and reasoning_tokens / server_tool_use are always null on the Anthropic usage path vs the Chat-Completions parser; - gives the comparison methodology and a copy-pasteable live checklist for a human with DEEPSEEK_API_KEY to fill in latency/token/correctness numbers; - states the decision honestly: keep as Experimental, keep-vs-promote PENDING the live numbers. No verified verdict is fabricated.

Reshape the front page toward the #3087 intended structure and ground every claim in current repo facts. - Identity line is now "the terminal coding agent for any model — open models first," matching docs/PROVIDERS.md and the agent guidance. - New "Providers and routing" section describes the real route system: RouteResolver minting a resolved route (endpoint, wire protocol, model ID, context limit, price), the network-free Models.dev-shaped catalog, route-aware context budgets, and the honest cost states (per-token, subscription/quota, credits, local/N-A, unknown/stale). - Fold sub-agents into a single durable "Fleet" section: ledger at .codewhale/fleet.jsonl, idempotent `fleet resume`, typed receipts, and roles/profiles/loadouts/slots with strong/balanced/fast model classes. - Dedicated "Safety" section: three modes, hooks allow/deny/ask, the actual sandbox backends (Seatbelt/Landlock+seccomp/bwrap), and /restore. - Fix stale facts: drop the "Kimi OAuth temporarily broken" note (a working kimi_oauth auth_mode ships), mark openai-codex experimental, replace the "big vs. cheap" tier wording with strong/balanced/fast, and add the qianfan provider that was missing from the list. - Sync the docs index (Fleet/Sub-agents links) and trim the prose. English README.md only; localized READMEs tracked as follow-up. Install and version tokens (--tag v0.8.64, # 0.8.64) are left unchanged for the check-versions gate.

Comprehensive 0.8.65 release notes covering the provider/route resolution epic (#2608), Fleet substrate (#3154), config modularization (#3311), and the release-hardening + correctness fixes integrated on this branch, plus the v0.8.64..main community work. Compare link added; version bump applied separately via prepare-release.sh.

…slice

Add a single dynamic provider identity for arbitrary `[providers.<name>] kind="openai-compatible" base_url api_key_env` tables, routing through the existing OpenAI Chat Completions wire protocol + LocalOrCustom pass-through. Config-driven selection via `provider = "<name>"`. Non-goals (deferred): visual picker integration, per-provider distinct identities, non-openai-compatible kinds.

Touchpoints:
- config crate: ProviderKind::Custom variant + ALL[28] (provider_kind.rs); registry Custom entry (provider.rs); total ProviderKind matches + ProvidersToml.custom field (lib.rs); conformance env_vars carve-out (tests.rs); explicit-Custom pass-through resolver test (route/tests.rs).
- tui crate: ApiProvider::Custom variant + KIND/FROM_KIND lookups, ProviderConfig.kind/api_key_env + is_openai_compatible_custom(), ProvidersConfig flatten map + custom_provider_config(), api_provider() Custom-before-Deepseek safety fix (closes silent misroute), name-keyed provider_config_for[_mut], deepseek_base_url/default_model/credential_url/model_completion_names/passes_model_through Custom arms, deepseek_api_key api_key_env resolution + error, env/header override arms, merge_provider_config/merge_custom_providers (config.rs); route_runtime preserves the custom name + tests (route_runtime.rs); reasoning-effort Custom arms == Openai (client.rs) + from_candidate test; provider_base_url_table_key + picker auth Custom arms (config_persistence.rs, tui/ui.rs).

Tests: config flatten parse + api_provider Custom/Deepseek; route_runtime custom endpoint verbatim model + ChatCompletions + insecure-http advisory; resolver explicit-Custom pass-through; client from_candidate binds custom base_url + model. Provider-descriptor conformance stays green over Custom.

… config split

Follow-ons surfaced by integrating #1519 + #3311:
- #1519 added the dynamic ApiProvider::Custom meta-provider. Exclude it from the website provider facts (facts-lib.mjs + facts-drift.ts EXCLUDED, like DeepseekCN) and from the provider-table drift check (META_PROVIDER_TABLES in check-provider-registry.py), since it's a runtime catch-all, not a shipped provider.
- #3311 moved DEFAULT_TEXT_MODEL's definition to config/models.rs, which broke the web facts' loose scrape (it mis-bound to an NVIDIA base URL). Read both files and match the const definition precisely; default model is deepseek-v4-pro again.

Bumps workspace + internal pins + npm wrapper + Cargo.lock + README install tags + the packaged tui changelog slice + generated web facts to 0.8.65, and finalizes the CHANGELOG [0.8.65] entry (dropped #3478, which is deferred). Via scripts/release/prepare-release.sh; check-versions.sh green.

claude · 2026-06-24T09:13:22Z

Claude finished @Hmbown's task in 4m 30s

Todo:

- Gather PR context
- Review diff (correctness, provider/route safety, reuse, tests, security)
- Review TUI core changes (fleet_setup.rs, client.rs, engine.rs, app.rs)
- Review config crate changes (catalog.rs, provider.rs, route/resolver.rs)
- Review web/build changes (layout.tsx, derive-facts.mjs, digest/page.tsx)
- Post final review

Review: fix(release): v0.8.65 release hardening (audit findings)

This review covers the full 22-commit diff, including the larger feature commits added since the first review pass.

`crates/tui/src/fleet/worker_runtime.rs` — `resolve_fleet_route` (lines ~194–228)

Route snapshot always shows DeepSeek even for non-DeepSeek users. resolve_fleet_route passes ApiProvider::Deepseek as the hard-coded scope to resolve_route_candidate. The comment explains why ("worker profile carries no provider authority"), but the consequence is that FleetReceipt.resolved_route.provider_id will always serialize as "deepseek" regardless of the session's active provider (Zai, OpenRouter, custom, etc.). This is an audit/observability gap: the persisted receipt misreports the provider for every non-DeepSeek Fleet run.

The fix is straightforward — thread the active ApiProvider (or its ProviderKind) into resolve_fleet_route so it can pass the correct provider scope. Since the receipt is durable, wrong provider_id in existing ledgers won't be correctable without a migration.

// worker_runtime.rs ~L218 — provider_kind hardcoded to default()
let candidate =
    resolve_route_candidate(ApiProvider::Deepseek, model_selector, None, None).ok()?;
//                          ^^^^^^^^^^^^^^^^^^^^^^^^^^ wrong for Zai / OpenRouter / custom users
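The suggested fix can be sketched as follows. This is a minimal illustration, not the project's code: the trimmed `ApiProvider` enum, the `RouteSnapshot` struct, the two-argument `resolve_route_candidate` stand-in, and the `id()` helper are all hypothetical simplifications. The point is only that the caller passes the session's active provider, so the persisted `provider_id` follows it.

```rust
/// Hypothetical, trimmed-down stand-in for the session's provider enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApiProvider {
    Deepseek,
    Zai,
    OpenRouter,
}

impl ApiProvider {
    /// Stable id that would be serialized into the receipt (assumed spelling).
    pub fn id(self) -> &'static str {
        match self {
            ApiProvider::Deepseek => "deepseek",
            ApiProvider::Zai => "zai",
            ApiProvider::OpenRouter => "openrouter",
        }
    }
}

/// Hypothetical, simplified route snapshot stored on a receipt.
#[derive(Debug, PartialEq, Eq)]
pub struct RouteSnapshot {
    pub provider_id: String,
    pub wire_model_id: String,
}

/// Stand-in for the real resolver: rejects an empty selector, otherwise
/// stamps the snapshot with the scope it was given.
pub fn resolve_route_candidate(
    scope: ApiProvider,
    model_selector: &str,
) -> Result<RouteSnapshot, String> {
    if model_selector.trim().is_empty() {
        return Err("empty model selector".to_string());
    }
    Ok(RouteSnapshot {
        provider_id: scope.id().to_string(),
        wire_model_id: model_selector.to_string(),
    })
}

/// The active provider is threaded in by the caller instead of being
/// hard-coded, so the snapshot reflects the session's real provider.
pub fn resolve_fleet_route(active: ApiProvider, model_selector: &str) -> Option<RouteSnapshot> {
    resolve_route_candidate(active, model_selector).ok()
}
```

With this shape, a Zai session would produce a snapshot whose `provider_id` is `"zai"`, and a resolution failure still degrades to `None` as in the snippet above.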


`crates/tui/src/tui/views/fleet_setup.rs` — `model_class_hint` (carried from first review, still present)

"code" and "review" appear in the profile_authoring_prompt format string as valid model_class_hint values, but neither can be returned by model_class_hint() — they fall through to "inherit". The generating LLM sees these as options but the function constrains to a disjoint set.

// fleet_setup.rs ~L651 — "code" and "review" listed in schema but unreachable
fn model_class_hint(label: &str) -> &'static str {
    match label { "fast" => "fast", "balanced" => "balanced", ... _ => "inherit" }
}
// profile_authoring_prompt says: "one of inherit, fast, balanced, deep-reasoning, code, review, or tool-heavy"
//                                                                               ^^^^  ^^^^^^ unreachable
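One way to keep the advertised options and the accepted values from drifting is to derive both from a single list. The sketch below assumes the reachable set is inherit, fast, balanced, deep-reasoning, and tool-heavy (the match arms are elided in the snippet above, so this set is a guess), and `model_class_hint_options` is a hypothetical helper name.

```rust
/// Assumed set of reachable hints; "inherit" doubles as the fallback.
pub static MODEL_CLASS_HINTS: [&str; 5] =
    ["inherit", "fast", "balanced", "deep-reasoning", "tool-heavy"];

/// Maps a label to a known hint, falling back to "inherit".
pub fn model_class_hint(label: &str) -> &'static str {
    MODEL_CLASS_HINTS
        .iter()
        .copied()
        .find(|hint| *hint == label)
        .unwrap_or("inherit")
}

/// Renders the option list for the authoring prompt from the same array,
/// so the prompt cannot advertise a value the mapper rejects.
pub fn model_class_hint_options() -> String {
    let (last, rest) = MODEL_CLASS_HINTS
        .split_last()
        .expect("hint list is non-empty");
    format!("one of {}, or {}", rest.join(", "), last)
}
```

If "code" and "review" are meant to be valid classes, adding them to the array would make them reachable and advertised in one step.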


`crates/config/src/route/resolver.rs` — Provider/Route Safety ✓

The module header and RouteRequest struct are explicit: "no prompt-text / freeform field … the resolver cannot see prompt content, so it cannot silently route on it." The classify() function correctly separates strict-direct (DeepSeek/Zai), aggregator, and local/custom. The test suite in route/tests.rs (+259 lines) directly covers the no-prefix-inference invariant (resolver_no_explicit_provider_does_not_infer_deepseek_from_prefix, resolver_aggregator_preserves_prefixed_wire_id_without_inferring_deepseek). Clean.
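The no-prefix-inference invariant can be illustrated with a small sketch. Everything here is a hypothetical simplification: the field names on `RouteRequest`, the `Resolved` struct, and the treatment of OpenRouter as the aggregator are assumptions, not the resolver's real types. The request deliberately has no prompt-text field, and the provider comes only from an explicit choice or the session scope.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProviderKind {
    Deepseek,
    Zai,
    OpenRouter,
    Custom,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RouteClass {
    StrictDirect,
    Aggregator,
    LocalOrCustom,
}

/// No prompt-text or freeform field: the resolver cannot route on content.
pub struct RouteRequest<'a> {
    pub explicit_provider: Option<ProviderKind>,
    pub scope: ProviderKind,
    pub model_selector: &'a str,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Resolved {
    pub provider: ProviderKind,
    pub class: RouteClass,
    pub wire_model_id: String,
}

pub fn classify(kind: ProviderKind) -> RouteClass {
    match kind {
        ProviderKind::Deepseek | ProviderKind::Zai => RouteClass::StrictDirect,
        ProviderKind::OpenRouter => RouteClass::Aggregator,
        ProviderKind::Custom => RouteClass::LocalOrCustom,
    }
}

/// The provider is never inferred from a "deepseek/" prefix in the model
/// selector; the selector is passed through as the wire id unchanged.
pub fn resolve(req: &RouteRequest<'_>) -> Resolved {
    let provider = req.explicit_provider.unwrap_or(req.scope);
    Resolved {
        provider,
        class: classify(provider),
        wire_model_id: req.model_selector.to_string(),
    }
}
```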

api.deepseeki.com in normalize_route_base_url is the China regional endpoint (confirmed in REBRAND.md), not a typo.

`crates/tui/src/client.rs` — `from_candidate` constructor (lines ~657–683) ✓

The design is correct: base_url and wire_model_id come from the resolver candidate; secrets (api_key, provider context) come from route_config, which resolve_runtime_route/prepared_route_config already stamps with the correct provider before from_candidate is called. The two inputs cannot drift because they share the same from_parts body.
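The split described above might look roughly like this. The struct shapes and the `from_config` constructor are hypothetical; only the names `from_candidate`, `from_parts`, `base_url`, `wire_model_id`, and `api_key` come from the review text. Both constructors funnel into one `from_parts` body, which is what keeps them from drifting.

```rust
/// Hypothetical shape of the resolver's output: routing facts only.
pub struct ReadyRouteCandidate {
    pub base_url: String,
    pub wire_model_id: String,
}

/// Hypothetical shape of the prepared route config: carries the secret.
pub struct RouteConfig {
    pub base_url: String,
    pub model: String,
    pub api_key: String,
}

pub struct Client {
    pub base_url: String,
    pub model: String,
    pub api_key: String,
}

impl Client {
    /// Single construction path shared by every constructor.
    fn from_parts(base_url: &str, model: &str, api_key: &str) -> Self {
        Client {
            base_url: base_url.trim_end_matches('/').to_string(),
            model: model.to_string(),
            api_key: api_key.to_string(),
        }
    }

    /// Legacy-style constructor: everything from the config.
    pub fn from_config(cfg: &RouteConfig) -> Self {
        Self::from_parts(&cfg.base_url, &cfg.model, &cfg.api_key)
    }

    /// Endpoint and wire model come from the resolved candidate;
    /// only the secret is taken from the config.
    pub fn from_candidate(candidate: &ReadyRouteCandidate, cfg: &RouteConfig) -> Self {
        Self::from_parts(&candidate.base_url, &candidate.wire_model_id, &cfg.api_key)
    }
}
```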

`crates/tui/src/tui/app.rs` — Fallback chain tests ✓

The three new tests in app/tests.rs (advance_fallback_skips_unauthed_middle_provider, _local_provider_is_eligible_without_a_key, _all_unready_exhausts_with_clear_reason) cover the three meaningful cases well. One gap: no test for ApiProvider::Custom in the chain — it carries no built-in env var and auth is named per-entry, so the eligibility check is different from other providers.
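To make the missing Custom case concrete, here is a hedged sketch of auth-aware eligibility. The `Auth` enum, `FallbackEntry`, and the injected `env` lookup are hypothetical; the real chain lives in app.rs and may differ. The sketch shows why a custom entry needs its own arm: its key variable is named per entry rather than built in.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Auth {
    /// Built-in provider with a well-known key variable.
    EnvKey(&'static str),
    /// Local provider: eligible without any key.
    None,
    /// Custom entry: the variable name comes from its api_key_env field.
    NamedEnv(String),
}

pub struct FallbackEntry {
    pub name: String,
    pub auth: Auth,
}

/// An entry is eligible when it needs no key or its key variable is non-empty.
pub fn is_eligible(entry: &FallbackEntry, env: &dyn Fn(&str) -> Option<String>) -> bool {
    let has = |var: &str| env(var).map_or(false, |v| !v.trim().is_empty());
    match &entry.auth {
        Auth::None => true,
        Auth::EnvKey(var) => has(*var),
        Auth::NamedEnv(var) => has(var.as_str()),
    }
}

/// Returns the next eligible entry after `current`, skipping unauthed ones.
pub fn advance_fallback<'a>(
    chain: &'a [FallbackEntry],
    current: usize,
    env: &dyn Fn(&str) -> Option<String>,
) -> Result<&'a FallbackEntry, String> {
    chain
        .iter()
        .skip(current + 1)
        .find(|entry| is_eligible(entry, env))
        .ok_or_else(|| "fallback chain exhausted: no ready provider left".to_string())
}
```

In production code the lookup would typically be `&|v| std::env::var(v).ok()`; injecting it keeps tests free of process-wide environment mutation.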

`crates/config/src/provider.rs` — Custom provider (#1519) ✓

The Custom placeholder defaults (http://localhost/v1, "custom-model") fail closed: a misconfigured entry hits loopback rather than a public host. Auth is resolved from the per-entry api_key_env field in deepseek_api_key() (path 1b, ~L3246), keeping the secret out of config storage. The provider_config_key() returning "custom" is intentional; the note in prepared_route_config explains why the provider string must not be overwritten for the Custom variant.
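A minimal sketch of the fail-closed behaviour, under stated assumptions: `CustomProviderConfig` and its methods are hypothetical names, and the error strings are invented for illustration. Only the two placeholder values and the `api_key_env` field come from the review. The config stores the name of the variable, never the secret itself.

```rust
pub const CUSTOM_DEFAULT_BASE_URL: &str = "http://localhost/v1";
pub const CUSTOM_DEFAULT_MODEL: &str = "custom-model";

/// Hypothetical per-entry config for an OpenAI-compatible custom provider.
#[derive(Default)]
pub struct CustomProviderConfig {
    pub base_url: Option<String>,
    pub model: Option<String>,
    pub api_key_env: Option<String>,
}

impl CustomProviderConfig {
    /// A missing base_url falls back to loopback, never to a public host.
    pub fn effective_base_url(&self) -> &str {
        self.base_url.as_deref().unwrap_or(CUSTOM_DEFAULT_BASE_URL)
    }

    pub fn effective_model(&self) -> &str {
        self.model.as_deref().unwrap_or(CUSTOM_DEFAULT_MODEL)
    }

    /// Reads the key from the variable named by api_key_env.
    /// `env` is injected so callers can pass `|v| std::env::var(v).ok()`.
    pub fn resolve_api_key(
        &self,
        env: impl Fn(&str) -> Option<String>,
    ) -> Result<String, String> {
        let var = self
            .api_key_env
            .as_deref()
            .ok_or_else(|| "custom provider has no api_key_env".to_string())?;
        env(var)
            .filter(|value| !value.trim().is_empty())
            .ok_or_else(|| format!("environment variable {var} is unset or empty"))
    }
}
```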

Refactored `crates/tui/src/config.rs` ✓

The monolith split into models, paths, search, subagent_limits leaf modules is purely mechanical: every public item from each leaf is re-exported at the root, so crate::config::<CONST> paths are unchanged. No behavioral change. +248/-539 in the root file, a net reduction.
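The path-stability property of such a split can be shown in miniature. This sketch inlines the modules for brevity; `MAX_SUBAGENTS` is a hypothetical constant, while `DEFAULT_TEXT_MODEL` and its value are taken from the commit notes above.

```rust
pub mod config {
    /// Leaf module holding model constants.
    pub mod models {
        pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro";
    }

    /// Leaf module holding sub-agent limits (constant is illustrative only).
    pub mod subagent_limits {
        pub const MAX_SUBAGENTS: usize = 4;
    }

    // Glob re-exports keep the old `config::<CONST>` paths working
    // after the definitions move into leaf modules.
    pub use self::models::*;
    pub use self::subagent_limits::*;
}

/// Existing call sites keep compiling against the root path.
pub fn default_model() -> &'static str {
    config::DEFAULT_TEXT_MODEL
}
```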

`web/app/layout.tsx` (Fix #4) — Low-risk deviation ✓

Returning children directly from the root layout is a deviation from Next.js's documented <html>/<body> requirement, but it works in practice because every reachable route goes through app/[locale]/layout.tsx. The risk noted in the first review stands: a future root-segment page (e.g. app/some-page/) won't get an HTML shell, but no such page exists today.

`.cargo/audit.toml`, `AGENTS.md`/`CLAUDE.md`, web `derive-facts.mjs` ✓

All three already reviewed and signed off in the first review pass. No new concerns.

Summary

Two actionable items, both correctness issues:

| # | File | Finding | Priority |
|---|------|---------|----------|
| 1 | `fleet/worker_runtime.rs:218` | `resolve_fleet_route` hardcodes `ApiProvider::Deepseek`; receipt `provider_id` is wrong for all non-DeepSeek Fleet runs | Correctness |
| 2 | `tui/views/fleet_setup.rs:~651` | `"code"` / `"review"` listed in schema docstring but unreachable from `model_class_hint` | Correctness (carried from first review) |

Everything else is clean: provider/route safety invariants are enforced and tested, the auth split in from_candidate is correct, the Custom provider fails closed, and the refactor is purely mechanical.

Hmbown and others added 7 commits June 23, 2026 23:50

Hmbown and others added 15 commits June 24, 2026 00:39

Hmbown changed the title ~~fix(release): v0.8.65 release hardening (audit findings)~~ release: v0.8.65 (provider/route + Fleet epics, hardening, version bump) Jun 24, 2026

Hmbown merged commit f1f7982 into main Jun 24, 2026
17 checks passed

Hmbown mentioned this pull request Jun 24, 2026

v0.8.65 EPIC: Separate provider facts, model facts, offerings, and route resolution #2608

Closed

claude Bot mentioned this pull request Jun 24, 2026

docs: reconcile v0.8.65 release ledger to live state #3493

Merged

Hmbown merged 22 commits into main from codex/v0.8.65-release-prep
